GATE PYQ

Pipeline Processor

Q21.
Data forwarding techniques can be used to speed up the operation in the presence of data dependencies. Consider the following replacements of the LHS with the RHS:
i. $R1 \rightarrow Loc,\ Loc \rightarrow R2 \;\equiv\; R1 \rightarrow R2,\ R1 \rightarrow Loc$
ii. $R1 \rightarrow Loc,\ Loc \rightarrow R2 \;\equiv\; R1 \rightarrow R2$
iii. $R1 \rightarrow Loc,\ R2 \rightarrow Loc \;\equiv\; R1 \rightarrow Loc$
iv. $R1 \rightarrow Loc,\ R2 \rightarrow Loc \;\equiv\; R2 \rightarrow Loc$
In which of the following options will the result of executing the RHS be the same as that of executing the LHS, irrespective of the instructions that follow?
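One way to check each replacement is to read $x \rightarrow y$ as "copy the value in x into y" and compare the final machine state (both registers and Loc) produced by the LHS and the RHS for every initial state. If the states always match, no later instruction can tell the two sequences apart. The brute-force sketch below assumes that reading; the helpers `run` and `equivalent` are hypothetical, written only for this illustration.

```python
from itertools import product

# Each instruction is (src, dst): copy the value in src into dst.
# "R1" and "R2" are registers; "Loc" is a memory location.
def run(program, state):
    s = dict(state)
    for src, dst in program:
        s[dst] = s[src]
    return s

REPLACEMENTS = {
    "i":   ([("R1", "Loc"), ("Loc", "R2")], [("R1", "R2"), ("R1", "Loc")]),
    "ii":  ([("R1", "Loc"), ("Loc", "R2")], [("R1", "R2")]),
    "iii": ([("R1", "Loc"), ("R2", "Loc")], [("R1", "Loc")]),
    "iv":  ([("R1", "Loc"), ("R2", "Loc")], [("R2", "Loc")]),
}

def equivalent(lhs, rhs, values=range(3)):
    # Identical final state (registers AND memory) for every initial state
    # means no instruction that follows can observe a difference.
    for r1, r2, loc in product(values, repeat=3):
        init = {"R1": r1, "R2": r2, "Loc": loc}
        if run(lhs, init) != run(rhs, init):
            return False
    return True

safe = [name for name, (lhs, rhs) in REPLACEMENTS.items() if equivalent(lhs, rhs)]
print(safe)  # ['i', 'iv']
```

Replacement ii drops the store to Loc, and iii keeps the wrong final value in Loc, so a later load from Loc would expose both.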

Q22.
Consider a 3 GHz (gigahertz) processor with a three-stage pipeline and stage latencies $\tau_1$, $\tau_2$ and $\tau_3$ such that $\tau_1 = 3\tau_2/4 = 2\tau_3$. If the longest pipeline stage is split into two pipeline stages of equal latency, the new frequency is _____ GHz, ignoring delays in the pipeline registers.
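Expressing every latency in units of $\tau_1$ gives $\tau_2 = 4\tau_1/3$ and $\tau_3 = \tau_1/2$, so $\tau_2$ sets the original clock period: $4\tau_1/3 = 1/3$ ns, i.e. $\tau_1 = 0.25$ ns. After splitting $\tau_2$ into two stages of $2\tau_1/3$, the longest stage is $\tau_1$. A sketch of the arithmetic with exact fractions, assuming the clock period equals the longest stage latency (register delay ignored, as the question states):

```python
from fractions import Fraction

f_old_ghz = 3                      # original clock frequency
tau1 = Fraction(1)                 # work in units of tau1
tau2 = tau1 * 4 / 3                # from tau1 = 3*tau2/4
tau3 = tau1 / 2                    # from tau1 = 2*tau3
stages = [tau1, tau2, tau3]

# Old period (ns) = 1 / f_old = longest stage; this fixes tau1 in ns.
period_old_ns = Fraction(1, f_old_ghz)
tau1_ns = period_old_ns / max(stages)

# Split the longest stage into two halves of equal latency.
longest = max(stages)
stages.remove(longest)
stages += [longest / 2, longest / 2]

period_new_ns = max(stages) * tau1_ns
f_new_ghz = 1 / period_new_ns
print(f_new_ghz)  # 4
```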

Q23.
A processor takes 12 cycles to complete an instruction I. The corresponding pipelined processor uses 6 stages with execution times of 3, 2, 5, 4, 6 and 2 cycles, respectively. What is the asymptotic speedup, assuming that a very large number of instructions are to be executed?
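In a pipeline the clock must accommodate the slowest stage, so every stage effectively occupies $\max(3, 2, 5, 4, 6, 2) = 6$ cycles. For $n$ instructions the non-pipelined time is $12n$ cycles and the pipelined time is $(k + n - 1) \cdot 6$ cycles with $k = 6$; as $n \to \infty$ the ratio tends to $12/6 = 2$. A minimal sketch, assuming all cycle counts refer to the same base clock (the function name `speedup` is hypothetical):

```python
def speedup(n, non_pipelined_cycles=12, stage_cycles=(3, 2, 5, 4, 6, 2)):
    """Speedup for n instructions; all counts use the same base clock."""
    k = len(stage_cycles)
    period = max(stage_cycles)            # pipeline clock = slowest stage
    t_non_pipelined = n * non_pipelined_cycles
    t_pipelined = (k + n - 1) * period    # fill the pipe, then one per clock
    return t_non_pipelined / t_pipelined

for n in (10, 1_000, 1_000_000):
    print(n, round(speedup(n), 4))
# Limit as n grows: 12 / max(stage_cycles) = 12 / 6 = 2
```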

Q24.
Delayed branching can help in the handling of control hazards. For all delayed conditional branch instructions, irrespective of whether the condition evaluates to true or false,
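With a single branch delay slot, the instruction placed immediately after the branch in memory is executed on both the taken and the not-taken paths, and only then does control transfer to the target or fall through. The toy trace generator below illustrates this behaviour; the instruction labels and the `BEQ->k` encoding are hypothetical, and it assumes exactly one delay slot with no branch inside the slot.

```python
def execute(program, taken):
    """Return the trace of executed instruction labels.

    program: list of labels; "BEQ->k" is a conditional branch to index k.
    taken: whether the branch condition evaluates to true.
    One delay slot: the instruction right after the branch always runs.
    """
    trace, pc = [], 0
    while pc < len(program):
        instr = program[pc]
        trace.append(instr)
        if instr.startswith("BEQ->"):
            target = int(instr.split("->")[1])
            # Delay slot: execute the next sequential instruction unconditionally.
            trace.append(program[pc + 1])
            pc = target if taken else pc + 2
        else:
            pc += 1
    return trace

prog = ["I0", "BEQ->4", "SLOT", "I3", "I4"]
print(execute(prog, taken=True))   # ['I0', 'BEQ->4', 'SLOT', 'I4']
print(execute(prog, taken=False))  # ['I0', 'BEQ->4', 'SLOT', 'I3', 'I4']
```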

Q25.
A pipeline P operating at 400 MHz has a speedup factor of 6 and operates at 70% efficiency. How many stages are there in the pipeline?
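Pipeline efficiency is the speedup divided by the number of stages, $\eta = S/k$, so $k = S/\eta = 6/0.7 \approx 8.57$; the clock frequency does not enter the calculation. Since the ratio is not an integer, the sketch below assumes the answer is the nearest whole number of stages:

```python
speedup = 6
efficiency = 0.70
k_exact = speedup / efficiency   # efficiency = speedup / k
k = round(k_exact)               # assumption: nearest whole number of stages
print(round(k_exact, 2), k)      # 8.57 9
```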

Q26.
The floating-point unit of a processor using a design $D$ takes $2t$ cycles, compared to $t$ cycles taken by the fixed-point unit. There are two more design suggestions, $D_1$ and $D_2$. $D_1$ uses 30% more cycles for the fixed-point unit but 30% fewer cycles for the floating-point unit, as compared to design $D$. $D_2$ uses 40% fewer cycles for the fixed-point unit but 10% more cycles for the floating-point unit, as compared to design $D$. For a given program which has 80% fixed-point operations and 20% floating-point operations, which of the following orderings reflects the relative performances of the three designs? ($D_i > D_j$ denotes that $D_i$ is faster than $D_j$.)
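The average number of cycles per operation for each design is the mix-weighted sum of its fixed-point and floating-point cycle counts; fewer cycles means a faster design, assuming all three designs share the same clock period. In units of $t$: $D = 0.8(1) + 0.2(2) = 1.2$, $D_1 = 0.8(1.3) + 0.2(1.4) = 1.32$ and $D_2 = 0.8(0.6) + 0.2(2.2) = 0.92$. The same computation as an exact-fraction sketch:

```python
from fractions import Fraction as F

t = F(1)  # fixed-point cycles in design D, used as the unit
designs = {
    # name: (fixed-point cycles, floating-point cycles)
    "D":  (t,              2 * t),
    "D1": (t * F(13, 10),  2 * t * F(7, 10)),   # +30% fixed, -30% float
    "D2": (t * F(6, 10),   2 * t * F(11, 10)),  # -40% fixed, +10% float
}
mix_fixed, mix_float = F(8, 10), F(2, 10)

avg = {name: mix_fixed * fx + mix_float * fl for name, (fx, fl) in designs.items()}
ranking = sorted(avg, key=avg.get)  # fewer cycles per operation = faster
print(" > ".join(ranking))  # D2 > D > D1
```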

Q27.
Consider an instruction pipeline with five stages without any branch prediction: Fetch Instruction (FI), Decode Instruction (DI), Fetch Operand (FO), Execute Instruction (EI) and Write Operand (WO). The stage delays for FI, DI, FO, EI and WO are 5 ns, 7 ns, 10 ns, 8 ns and 6 ns, respectively. There are intermediate storage buffers after each stage and the delay of each buffer is 1 ns. A program consisting of 12 instructions I1, I2, I3,..., I12 is executed in this pipelined processor. Instruction I4 is the only branch instruction and its branch target is I9. If the branch is taken during the execution of this program, the time (in ns) needed to complete the program is
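The clock period is the slowest stage plus one buffer delay, $10 + 1 = 11$ ns. Assuming the branch outcome is known only at the end of EI (no prediction), the fetch of I9 starts in the cycle after I4's EI, so 3 cycles are lost. The executed sequence I1 to I4 followed by I9 to I12 then finishes in $(5 + 8 - 1) + 3 = 15$ cycles, i.e. $15 \times 11 = 165$ ns. A cycle-level sketch under those assumptions (no other stalls, wrong-path instructions squashed):

```python
stage_ns = [5, 7, 10, 8, 6]           # FI, DI, FO, EI, WO
buffer_ns = 1
clock_ns = max(stage_ns) + buffer_ns  # 11 ns per cycle
k = len(stage_ns)
EI = 3  # 0-based position of EI; assumed to be where the branch resolves

# Instructions that actually complete when the branch at I4 is taken to I9.
executed = ["I1", "I2", "I3", "I4", "I9", "I10", "I11", "I12"]
BRANCH = "I4"

fetch_cycle = 1
finish_cycle = 0
for name in executed:
    finish_cycle = fetch_cycle + k - 1      # cycle in which WO completes
    if name == BRANCH:
        # The target fetch waits until the cycle after the branch's EI stage.
        fetch_cycle = fetch_cycle + EI + 1
    else:
        fetch_cycle += 1

total_ns = finish_cycle * clock_ns
print(finish_cycle, total_ns)  # 15 165
```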

Q28.
Consider an instruction pipeline with four stages (S1, S2, S3 and S4), each consisting only of combinational circuitry. Pipeline registers are required between each pair of stages and at the end of the last stage. Delays for the stages and for the pipeline registers are as given in the figure. What is the approximate speedup of the pipeline in steady state under ideal conditions, compared to the corresponding non-pipelined implementation?
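In steady state a pipeline completes one instruction per clock, and the clock period must cover the slowest stage plus one pipeline-register delay. The non-pipelined version, assuming it has no inter-stage registers, takes the sum of the stage delays per instruction. The sketch below encodes that ratio; the delays in the usage line are purely illustrative placeholders, not the values from the figure, and the function name is hypothetical.

```python
def steady_state_speedup(stage_ns, reg_ns):
    """Non-pipelined time per instruction divided by the pipelined clock period."""
    non_pipelined = sum(stage_ns)      # assumption: no registers between stages
    clock = max(stage_ns) + reg_ns     # slowest stage plus one register delay
    return non_pipelined / clock

# Illustrative placeholder delays (ns), NOT the figure's values.
print(steady_state_speedup([4, 6, 9, 5], reg_ns=1))  # 2.4
```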

Q29.
Consider a pipelined processor with the following four stages:
IF: Instruction Fetch
ID: Instruction Decode and Operand Fetch
EX: Execute
WB: Write Back
The IF, ID and WB stages take one clock cycle each to complete the operation. The number of clock cycles for the EX stage depends on the instruction: the ADD and SUB instructions need 1 clock cycle and the MUL instruction needs 3 clock cycles in the EX stage. Operand forwarding is used in the pipelined processor. What is the number of clock cycles taken to complete the following sequence of instructions?
ADD R2, R1, R0    R2 ← R1 + R0
MUL R4, R3, R2    R4 ← R3 * R2
SUB R6, R5, R4    R6 ← R5 - R4
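With forwarding, a result can be passed from the end of a producer's EX cycle straight into the consumer's EX in the next cycle. Assuming a single, non-pipelined EX unit (so a 3-cycle MUL blocks the unit) and in-order issue, the timeline is ADD: IF1 ID2 EX3 WB4; MUL: IF2 ID3 EX4-6 WB7; SUB: IF3 ID4 (stall) EX7 WB8, giving 8 cycles. The sketch below computes the same schedule; the tuple encoding of instructions and the name `total_cycles` are hypothetical.

```python
EX_CYCLES = {"ADD": 1, "SUB": 1, "MUL": 3}

def total_cycles(program):
    """program: list of (op, dest, srcs). Returns the cycle of the last WB.

    Assumptions: in-order issue, one non-pipelined EX unit, forwarding from
    the end of a producer's EX to the start of a consumer's EX, and fetch
    never stalls (true for this short sequence).
    """
    ready = {}           # register -> last EX cycle of its producer
    ex_busy_until = 0    # last cycle the EX unit is occupied
    prev_ex_start = 0    # an instruction holds ID until it enters EX
    last_wb = 0
    for i, (op, dest, srcs) in enumerate(program):
        if_cycle = i + 1
        id_cycle = max(if_cycle + 1, prev_ex_start)
        ex_start = max([id_cycle + 1, ex_busy_until + 1]
                       + [ready.get(r, 0) + 1 for r in srcs])
        ex_end = ex_start + EX_CYCLES[op] - 1
        last_wb = ex_end + 1
        ready[dest] = ex_end
        ex_busy_until = ex_end
        prev_ex_start = ex_start
    return last_wb

program = [
    ("ADD", "R2", ["R1", "R0"]),
    ("MUL", "R4", ["R3", "R2"]),
    ("SUB", "R6", ["R5", "R4"]),
]
print(total_cycles(program))  # 8
```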

Q30.
We have two designs, D1 and D2, for a synchronous pipeline processor. D1 has 5 pipeline stages with execution times of 3 nsec, 2 nsec, 4 nsec, 2 nsec and 3 nsec, while design D2 has 8 pipeline stages, each with an execution time of 2 nsec. How much time can be saved using design D2 over design D1 for executing 100 instructions?
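For a synchronous $k$-stage pipeline the clock period is the slowest stage time $\tau$, and $n$ instructions finish in $(k + n - 1)\tau$. So D1 needs $(5 + 99) \times 4 = 416$ nsec and D2 needs $(8 + 99) \times 2 = 214$ nsec, a saving of 202 nsec, assuming no stalls. As a sketch (the helper name `pipeline_time` is hypothetical):

```python
def pipeline_time(stage_ns, n):
    """Time for n instructions on a synchronous k-stage pipeline with no stalls."""
    k = len(stage_ns)
    clock = max(stage_ns)          # every stage runs at the slowest stage's pace
    return (k + n - 1) * clock

t_d1 = pipeline_time([3, 2, 4, 2, 3], 100)   # 104 cycles * 4 nsec
t_d2 = pipeline_time([2] * 8, 100)           # 107 cycles * 2 nsec
print(t_d1, t_d2, t_d1 - t_d2)  # 416 214 202
```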